Author Profiling using Stylometric and Structural Feature Groupings
Authors
Abstract
In this paper we present an approach to the task of author profiling. We propose a coherent grouping of features, combined with appropriate preprocessing steps for each group. The groups we used were stylometric and structural, featuring, among others, trigrams and counts of Twitter-specific characteristics. We address gender and age prediction as a classification task and personality prediction as a regression problem, using Support Vector Machines and Support Vector Machine Regression respectively, on documents created by joining each user’s tweets.
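The two feature groups described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of character (rather than word) trigrams, the regular expressions, the masking tokens, and the function names `join_tweets`, `structural_counts`, and `char_trigrams` are all assumptions made for illustration. It shows one document built per user, with structural counts taken from the raw text and trigrams taken from a separately preprocessed copy.

```python
import re
from collections import Counter

# Hypothetical patterns for Twitter-specific markers (assumptions, not from the paper).
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+")


def join_tweets(tweets):
    """Join one user's tweets into a single document, one tweet per line."""
    return "\n".join(tweets)


def structural_counts(document):
    """Structural group: count Twitter-specific markers in the raw document."""
    return {
        "mentions": len(MENTION_RE.findall(document)),
        "hashtags": len(HASHTAG_RE.findall(document)),
        "urls": len(URL_RE.findall(document)),
    }


def char_trigrams(document):
    """Stylometric group: count character trigrams after masking URLs and
    mentions with placeholder tokens and lowercasing the text."""
    text = URL_RE.sub("URL", document)
    text = MENTION_RE.sub("USER", text)
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


# Made-up example tweets for a single user.
tweets = ["Loving #python today!", "@bob see https://example.com #python"]
doc = join_tweets(tweets)
counts = structural_counts(doc)
trigrams = char_trigrams(doc)
```

In a full pipeline, feature vectors of this kind would be passed to a Support Vector Machine for gender and age classification and to Support Vector Regression for the personality scores; that step is omitted here.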
Similar resources
Author Profiling using Complementary Second Order Attributes and Stylometric Features
In this paper we present an approach to the task of author profiling. We propose a modular framework that extracts two main groups of features, combined with appropriate preprocessing, and uses Support Vector Machines for classification. The two main groups we used were stylometric and discriminative, featuring trigrams on one hand and complementary-weighted Second Order Attributes on the oth...
An Author Profiling Approach Based on Language-dependent Content and Stylometric Features
We describe the approach that we submitted to the 2015 PAN competition [5] for the author profiling task. The task consists of predicting some attributes of an author by analyzing a set of his/her tweets. We consider several sets of stylometric and content features, and different decision algorithms: we use a different combination of features and decision algorithm for each language-attrib...
Using Textual Transcripts of Parliamentary Interventions for Profiling Portuguese Politicians
This paper presents an experimental study on the subject of profiling political actors through textual transcriptions of their parliamentary interventions. Supervised learning techniques were used to learn models, which attempt to classify Portuguese politicians according to their gender, their age group, or their political affiliation and orientation. Experiments were made using different type...
Determining Window Size from Plagiarism Corpus for Stylometric Features
The sliding window concept is a common method for computing a profile of a document with unknown structure. This paper outlines an experiment with a stylometric word-based feature in order to determine an optimal size of the sliding window. It was conducted for a vocabulary richness method called ’average word frequency class’ using the PAN 2015 source retrieval training corpus for plagiarism det...
Author Identification using Stylometric Features
In this work we present a strategy for author identification for documents written in Portuguese. It takes into account a writer-independent model, which reduces the pattern recognition problem to a single model and two classes and hence makes it possible to build a robust system even when few genuine samples per writer are available. We also introduce a stylometric feature set, which is based on th...